CLAIMS

1. (Original) A method for computing a measure of similarity between a first (or
input) document and a second (or search results) document, comprising: 

(a) receiving a first list of rated keywords extracted from the first document and a 
second list of rated keywords extracted from the second document; 

(b) using the first and second lists of rated keywords to determine whether the first 
document forms part of the second document using a first computed percentage indicating 
what percentage of keyword ratings in the first list also exist in the second list; 

(c) computing a second percentage indicating what percentage of keyword ratings 
along with a set of their neighboring keyword ratings in the first list also exist in the second 
list when the first computed percentage indicates that the first document is included in the 
second document; 

(d) using the first computed percentage to specify the measure of similarity when the 
second computed percentage is greater than the first computed percentage. 

2. (Original) The method according to claim 1, wherein the second percentage
at (c) is computed by giving weight only to those keywords and their set of neighboring 
keywords in the first list that match in the second list and a threshold percentage of the 
keywords in their set of neighboring keywords. 

3. (Original) The method according to claim 2, wherein the second percentage 
at (c) is computed by giving full weight to those keywords in the first list of rated keywords 
that cannot be accurately identified as having a complete set of neighboring keywords in 
the second set of keywords. 

4. (Original) The method according to claim 2, wherein the threshold percentage 
is reduced when the first list of rated keywords is identified using OCR. 

5. (Original) The method according to claim 1, further comprising (e) if the first
computed percentage does not indicate that the first document is included in the second 
document, computing a third percentage using the Jaccard distance measure. 
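
As a minimal sketch of the Jaccard measure named in claim 5, assuming keywords are compared as unweighted sets: the claim does not state whether the third percentage is the Jaccard index or its complement (the Jaccard distance), so both are shown. The function names are hypothetical and do not come from the claims.

```python
def jaccard_index(first_keywords, second_keywords):
    """Share of distinct keywords common to both lists (0.0 to 1.0)."""
    a, b = set(first_keywords), set(second_keywords)
    if not a and not b:
        # Assumption: two empty lists are treated as identical.
        return 1.0
    return len(a & b) / len(a | b)


def jaccard_distance(first_keywords, second_keywords):
    """Complement of the Jaccard index."""
    return 1.0 - jaccard_index(first_keywords, second_keywords)


# Illustrative data only; these keywords are not from the source.
first = ["patent", "keyword", "similarity"]
second = ["keyword", "similarity", "search", "document"]
index = jaccard_index(first, second)        # 2 shared / 5 distinct
distance = jaccard_distance(first, second)
```
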

6. (Original) The method according to claim 5, further comprising (f) if the third
computed percentage indicates that the first document is a revision of the second 
document, computing a fourth percentage indicating what percentage of keyword ratings 
along with a set of their neighboring keyword ratings in the second list also exist in the first 
list. 

7. (Original) The method according to claim 6, further comprising using the 
fourth computed percentage to specify the measure of similarity except when: (i) the fourth 
computed percentage is greater than the second computed percentage; (ii) the first list of 
rated keywords is identified using OCR; (iii) the fourth computed percentage is greater than 
fifty percent; and (iv) less than twenty percent of the keywords in the first list of keywords 
are in the second list of keywords. 
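
The exception in claim 7 can be sketched as a single predicate. This is an illustration under stated assumptions: conditions (i) through (iv) are read as jointly required because they are joined by "and", percentages are expressed as fractions between 0 and 1, and the share in condition (iv) is an unweighted keyword count. All names are hypothetical, and the claim does not say which value is used when the exception applies.

```python
def fourth_percentage_applies(fourth, second, first_is_ocr, overlap_share):
    """Return True when the fourth percentage specifies the similarity.

    fourth, second : the fourth and second computed percentages (0.0 to 1.0)
    first_is_ocr   : True if the first keyword list was identified using OCR
    overlap_share  : fraction of first-list keywords found in the second list
    """
    exception = (
        fourth > second            # condition (i)
        and first_is_ocr           # condition (ii)
        and fourth > 0.50          # condition (iii)
        and overlap_share < 0.20   # condition (iv)
    )
    return not exception
```
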

8. (Original) The method according to claim 1, wherein the first computed 
percentage indicates that the first document is included in the second document when the 
percentage defined by ratio of Sum1/Sum2 is greater than approximately ninety percent, 
where: D1 is the number of keywords in first list of keywords; D2 is the number of keywords
in the second list of keywords; Sum1 is the sum of the weights of keywords that appear in 
D1 that also appear in D2; Sum2 is the sum of the weights of keywords in D1. 
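
The inclusion test of claim 8 can be sketched as follows. This is a minimal illustration, assuming each list of rated keywords is held as a mapping from keyword to weight and that "approximately ninety percent" is a fixed cutoff of 0.90. The function names and sample data are hypothetical.

```python
def first_percentage(first, second):
    """Ratio Sum1/Sum2 for two keyword-to-weight mappings."""
    # Sum2: total weight of every keyword in the first list.
    sum2 = sum(first.values())
    if sum2 == 0:
        # Assumption: an empty or zero-weight first list yields 0.
        return 0.0
    # Sum1: weight of first-list keywords that also appear in the second list.
    sum1 = sum(weight for keyword, weight in first.items() if keyword in second)
    return sum1 / sum2


def is_included(first, second, threshold=0.90):
    """True when the first document is taken to be included in the second."""
    return first_percentage(first, second) > threshold


# Illustrative data only.
first = {"scanner": 5.0, "toner": 3.0, "duplex": 2.0}
second = {"scanner": 4.0, "toner": 1.0, "paper": 2.0}
ratio = first_percentage(first, second)   # Sum1 = 8.0, Sum2 = 10.0
included = is_included(first, second)
```

Only the weights from the first list enter the ratio, so the ratings assigned to the same keywords in the second list do not affect this test.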

9. (Original) The method according to claim 1, wherein the first list of rated 
keywords includes one or more keywords translated from a second language different from 
a first language that is identified as being a primary language of the first document. 

10. (Original) The method according to claim 1, wherein the first document is a 
portion of the second document. 



Atty. Dkt. No. A3358Q-US-NP 

XERZ2 01374 

11. (Currently amended) A computer-based system for computing a measure of
similarity between a first (or input) document and a second (or search results) document, 
comprising: 

(a) means for receiving a first list of rated keywords extracted from the first 
document and a second list of rated keywords extracted from the second document; 

(b) means for using the first and second lists of rated keywords to determine 
whether the first document forms part of the second document using a first computed 
percentage indicating what percentage of keyword ratings in the first list also exist in the 
second list; 

(c) means for computing a second percentage indicating what percentage of 
keyword ratings along with a set of their neighboring keyword ratings in the first list also 
exist in the second list when the first computed percentage indicates that the first 
document is included in the second document; and

(d) means for using the first computed percentage to specify the measure of 
similarity when the second computed percentage is greater than the first computed 
percentage. 

12. (Original) The system according to claim 11, wherein the second percentage
at (c) is computed by said computing means by giving weight only to those keywords and 
their set of neighboring keywords in the first list that match in the second list and a 
threshold percentage of the keywords in their set of neighboring keywords. 

13. (Original) The system according to claim 12, wherein the second percentage 
at (c) is computed by said computing means by giving full weight to those keywords in the 
first list of rated keywords that cannot be accurately identified as having a complete set of 
neighboring keywords in the second set of keywords. 

14. (Original) The system according to claim 12, wherein the threshold 
percentage is reduced when the first list of rated keywords is identified using OCR. 

15. (Original) The system according to claim 11, further comprising

(e) if the first computed percentage does not indicate that the first document is 
included in the second document, means computes a third percentage using the Jaccard 
distance measure. 

16. (Original) The system according to claim 15, further comprising (f) if the third 
computed percentage indicates that the first document is a revision of the second 
document, means computes a fourth percentage indicating what percentage of keyword 
ratings along with a set of their neighboring keyword ratings in the second list also exist in 
the first list. 

17. (Original) The system according to claim 16, further comprising means for 
using the fourth computed percentage to specify the measure of similarity except when: (i) 
the fourth computed percentage is greater than the second computed percentage; (ii) the 
first list of rated keywords is identified using OCR; (iii) the fourth computed percentage is 
greater than fifty percent; and (iv) less than twenty percent of the keywords in the first list of 
keywords are in the second list of keywords. 

18. (Original) The system according to claim 11, wherein the first computed 
percentage indicates that the first document is included in the second document when the 
percentage defined by ratio of Sum1/Sum2 is greater than approximately ninety percent, 
where: D1 is the number of keywords in first list of keywords; D2 is the number of keywords
in the second list of keywords; Sum1 is the sum of the weights of keywords that appear in
D1 that also appear in D2; Sum2 is the sum of the weights of keywords in D1.

19. (Original) The system according to claim 11, wherein the first list of rated 
keywords includes one or more keywords translated from a second language different from 
a first language that is identified as being a primary language of the first document. 

20. (Original) An article of manufacture for computing a measure of similarity
between a first (or input) document and a second (or search results) document, the article 
of manufacture comprising computer usable media including computer readable 
instructions embedded therein that causes a computer to perform a method, wherein the 
method comprises: 

(a) receiving a first list of rated keywords extracted from the first document and a 
second list of rated keywords extracted from the second document; 

(b) using the first and second lists of rated keywords to determine whether the first 
document forms part of the second document using a first computed percentage indicating 
what percentage of keyword ratings in the first list also exist in the second list; 

(c) computing a second percentage indicating what percentage of keyword ratings 
along with a set of their neighboring keyword ratings in the first list also exist in the second 
list when the first computed percentage indicates that the first document is included in the 
second document; 

(d) using the first computed percentage to specify the measure of similarity when the 
second computed percentage is greater than the first computed percentage. 